ABSTRACT: While Building Information Modelling (BIM) can support the management and visualisation of
construction projects, Augmented Reality (AR) holds great promise to enhance interaction with these complex 
models. The accurate positioning of BIM-AR models in construction sites is critical to ensure that the virtual and 
real-world environments are correctly aligned. This paper reviews state-of-the-art positioning techniques
through a survey of the literature. It explores the different techniques used to position BIM-AR models and
examines the interconnections and differences between them, with an emphasis on their applicability to the
construction industry. The review also explores the challenges and limitations of each technique, in terms of the 
trade-offs between accuracy, computational efficiency, and robustness in varying environments. By providing an 
overview of positioning techniques in BIM-AR, this paper aims to guide researchers and practitioners in assessing 
the suitability of these techniques in the context of construction sites. The insights gained from this review may 
inform the development of efficient BIM-AR platforms that are more aligned with the dynamic and complex nature 
of construction sites. 
1. INTRODUCTION


The construction industry is constantly looking for innovations and new methods to improve collaboration and 
productivity (Schiavi et al., 2022). Building Information Modelling (BIM) is a recent innovation in information 
systems that has proven its value for the construction industry. Currently, the use of BIM is a common practice in 
the built environment (Amin & Abanda, 2019), although its use in the construction stage is limited (Nassereddine 
et al., 2022; Sidani et al., 2021). Recent advancements in immersive visualisation technologies have created new 
prospects for exploring the potential of site-based BIM settings. The research on immersive visualisation 
technologies such as Augmented Reality (AR) and Virtual Reality (VR) has been growing with the aim of 
improving collaboration, productivity, and output quality of construction projects (Schiavi et al., 2022). In 
particular, the integration between BIM and AR can bridge the gap between the site and the office by enabling 
access to the BIM models onsite (Schiavi et al., 2022; Sidani et al., 2021). The potential of BIM-based AR 
integration is attributed to the potential improvements in collaboration and onsite information retrieval and 
representation (Wang et al., 2014). However, the majority of studies tend to investigate general AR applications 
that do not depend on the utilisation of BIM. The implementation of BIM-AR depends on complex software 
architecture and sophisticated positioning techniques which are not necessarily needed in general AR applications 
(Amin et al., 2023). The focus on BIM-AR should provide a deeper understanding of the specific benefits and 
limitations of the technology in the context of the practical requirements of real-life situations. 


In the context of BIM-AR in the construction stage, the accurate positioning of 3D models onsite remains a major
challenge and one of the most active research subjects (Azuma et al., 2001; Servières et al., 2021; Van
Krevelen & Poelman, 2010). Positioning refers to the system’s ability to accurately localise and track the BIM 
model with the proper alignment, orientation, and elevation (Amin et al., 2023). Numerous studies have explored 
a multitude of positioning techniques, employing different hardware and software components (Nee & Ong, 2023). 
The choice of a suitable technique is usually driven by a trade-off between accuracy, computational efficiency,
and the region of space in which the system should work properly (Rolland et al., 2001; Servières et al., 2021). To
decide whether a specific positioning technique is effective for a specific use case, it is important to have a
global understanding of the enabling technologies of positioning. It is equally important to explore how the
effective management of positioning BIM-AR models in construction sites affects the
responsibilities and skillsets of existing BIM roles. Hence, we adopt a literature review to survey state-of-the-art
positioning techniques in the construction stage and develop a better understanding of their uses and limitations in 
the context of construction sites. 


The motivation is to gain a comprehensive overview of the various positioning techniques in BIM-AR to capture 
the nuances and interconnections between them. This should help better understand their uses and limitations in 
the context of the dynamic and complex nature of construction sites. Such an understanding should provide insights 
into the development of BIM-AR platforms that are tailored to meet the demands of such challenging environments. 
In addition, we discuss how the effective management of positioning BIM-AR models in construction sites requires
revisiting the existing structure of BIM roles including prospects for new responsibilities and skillsets. 


2. LITERATURE REVIEW 


AR encompasses a wide range of positioning technologies and methods, each with its own set of intricacies and 
considerations. Capturing the subtle differences and interconnections of these techniques requires a global 
understanding of Computer Vision, sensor technologies, and algorithm development. Positional tracking systems 
can be grouped under Outside-In and Inside-Out (Gourlay & Held, 2017). Outside-In systems utilise external 
stationary sensors or cameras (trackers) to track feature points, such as light emitters, that are mounted on or
built into the tracked device (Gourlay & Held, 2017; Pustka et al., 2012). The main drawback of Outside-In
systems is that the accuracy and stability of tracking are limited by the space the trackers can cover (Gourlay &
Held, 2017). In contrast, Inside-Out systems use the cameras and sensors that are built into the
device to map the environment and estimate its local pose (Figure 1). It is believed that Inside-Out systems have
an advantage over Outside-In in AR because the former requires less environmental setup and enables a more 
dynamic experience (Gourlay & Held, 2017). Inside-Out is the dominant positioning technique used in 
smartphones and modern AR headsets. However, grouping positioning techniques of AR into inside-out and 
outside-in only partially describes the vast pool of approaches, hardware and software components used. It is more 
common to group AR positioning techniques under three categories based on the type of sensors: sensor-based, 
vision-based and hybrid (Amin et al., 2023). This classification is also adopted by others (Rolland et al., 2001; Zhou et
al., 2008); however, positioning techniques need to be understood with a wider BIM-AR function focus (Amin et
al., 2023). Williams et al. (2014) provide an important application case study, which we update and extend in this 
study. This study expands on the systematic literature review in Amin et al. (2023) to develop and update a 
comprehensive map of BIM-AR positioning techniques in the construction stage. We provide a detailed description 
of each category and develop an understanding of the interconnections and differences between them in the context 
of the nature and requirements of construction sites. 


Figure 1 Inside-out systems rely on a group of cameras and/or sensors manufactured into the headsets. These
systems do not depend on any external sensory information. 


3. RESULTS 
3.1 BIM-AR Positioning Techniques 


Regardless of the selected technique, any BIM-AR positioning system will need to do two tasks: estimate the local 
pose (location and orientation) of the user and construct a map of the surrounding environment (Wang et al., 2014). 
This happens through a two-stage process: a learning stage and a tracking stage. The learning stage comprises 
understanding the surrounding environment and recognising its features to create a spatial map that serves as a 
foundation for accurate tracking (Nee & Ong, 2023). The tracking stage is where the system initialises the 
coordinate system, localises the model in six degrees of freedom (6DOF), and monitors the changes to its location 
and orientation relative to the environment (Choi & Park, 2021; Zhou et al., 2008). To achieve accurate positioning 
of BIM-AR models, many techniques and approaches have been developed. Positioning techniques in BIM-AR 
can be grouped under three categories: sensor-based, vision-based and hybrid (Azuma et al., 2001; Billinghurst et 
al., 2015; Palmarini et al., 2018; Servières et al., 2021). Manual mapping is an additional technique that is not 
frequently mentioned in the literature due to its limitations (Amin et al., 2023). The dependency map in Figure 3
shows the different positioning techniques used in BIM-AR and the interconnections among them. The next 
subsections discuss how the technology works in each category in detail and describe the associated techniques 
and their limitations. 


3.1.1 Sensor-based systems 


Sensor-based tracking refers to the process of determining the position and orientation of a user or device within
a real-world environment by utilising various sensors (Rolland et al., 2001; Williams et al., 2014). Compared to
vision-based tracking methods, sensor-based tracking is faster and more robust in determining the pose of the
device; however, sensor-based systems are analogous to open-loop systems whose output can accumulate errors
(Zhou et al., 2008). Several types of sensors are commonly used in sensor-based tracking in BIM-AR:


1. Inertial Sensors: Inertial sensors are commonly used in tracking systems for BIM-AR, usually as
complementary sensors to visual ones. The most common inertial sensor used for pose estimation is
the Inertial Measurement Unit (IMU) (Nee & Ong, 2023). IMUs can provide accurate information about
the pose of the device in all six degrees of freedom, usually by fusing information from an integrated
gyroscope, accelerometer and magnetometer (Ahmad et al., 2013). While inertial sensors provide high
accuracy in short-term tracking, they suffer from error accumulation over time and need to be combined
with other sensors for accurate tracking (Rolland et al., 2001; Williams et al., 2014; Nee & Ong, 2023).
IMUs have become an essential component in all smartphones and modern AR headsets.


2. Laser-based Depth Sensors: also referred to as optical sensors, these utilise different kinds of light in the
infrared spectrum to measure depth and capture the geometry of objects and surfaces in
real time. Among several types of depth sensors, Time-of-Flight (ToF)
and Light Detection and Ranging (LiDAR) are the most commonly used techniques in BIM-AR (Amin
et al., 2023). A ToF laser sensor emits an infrared laser beam and measures the time the beam takes to reflect
back in order to calculate the distance to an object (Rolland et al., 2001; Williams et al., 2014). LiDAR scanners
are usually more expensive because they can cover larger areas and provide higher accuracy.


3. GPS: a widely used sensor for outdoor localisation. It leverages a network of satellites to determine
the latitude, longitude, and sometimes altitude of a device. GPS enables location-based AR, where digital
elements are superimposed based on where the user stands, as in Williams et al. (2014). Very few studies
have utilised GPS to position BIM-AR models (Fenais et al., 2018; Williams et al., 2014) due to its low
accuracy and poor performance indoors.


4. Wireless Network Sensing: such as Wi-Fi or Bluetooth. They can be utilised for determining the location 
of the device but have significant limitations related to setup, coverage and accuracy (Craig, 2013, 
Williams et al., 2014). A single study utilised Wi-Fi for BIM-AR positioning (Degani et al., 2019). 
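
The error accumulation noted above for inertial sensors can be illustrated with a small dead-reckoning sketch. This is a minimal illustration and is not taken from any of the reviewed studies: it assumes a stationary device whose accelerometer reports a small constant bias, and the function name, bias value and sample rate are hypothetical.

```python
def dead_reckon_drift(bias_mps2: float, rate_hz: float, duration_s: float) -> float:
    """Return the position error (m) after double-integrating a biased accelerometer.

    The device is assumed to be stationary, so any non-zero position is pure drift.
    """
    dt = 1.0 / rate_hz
    velocity = 0.0
    position = 0.0
    steps = int(round(duration_s * rate_hz))
    for _ in range(steps):
        velocity += bias_mps2 * dt   # first integration: acceleration -> velocity
        position += velocity * dt    # second integration: velocity -> position
    return position


if __name__ == "__main__":
    # Hypothetical values: 0.02 m/s^2 bias sampled at 100 Hz.
    for seconds in (1, 10, 60):
        drift = dead_reckon_drift(0.02, 100.0, seconds)
        print(f"after {seconds:>2} s: drift = {drift:.3f} m")
```

Under these assumptions the drift grows roughly quadratically with time (about 0.5 x bias x t^2), reaching approximately 1 m after 10 s and 36 m after 60 s, which is why inertial measurements are usually combined with other sensors.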


Other sensors that are frequently mentioned in general AR literature but are not used in BIM-AR in the construction 
stage are magnetic sensors and acoustic sensors. Magnetic sensors, also known as magnetometers, detect changes
in the Earth's magnetic field to determine the orientation of the device. Because of magnetic fields
other than the Earth's, soft iron effects, and temperature changes, magnetometers are highly susceptible to
disturbances. As a result, their use is often disregarded in various applications, particularly in industrial settings such as
construction projects (Rolland et al., 2001; Nee & Ong, 2023). Acoustic sensors utilise the principle of ToF used
in optical sensors but use sound waves instead of laser beams. The speed of sound varies with environmental 
conditions and sound waves can be easily obstructed, so acoustic sensors are not a reliable tracking technique 
(Rolland et al., 2001, Nee and Ong, 2023). It is argued that the sole reliance on sensor-based techniques would 
introduce significant error variables (Craig, 2013). This is due to some requirements that are not always available 
on construction sites such as network coverage, and due to their sensitivity to some environmental conditions such 
as temperature, humidity, and noise. In addition, measurement errors are accumulated over time and need 
continuous calibration because pose estimation is evaluated based on the previous position (Craig, 2013). 
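
The ToF principle shared by optical and acoustic sensors, and the sensitivity of acoustic ranging to environmental conditions, can be sketched as follows. This is an illustrative example under stated assumptions: the speed of sound uses the common linear approximation for dry air, and the function names and the echo time are hypothetical.

```python
SPEED_OF_LIGHT_MPS = 299_792_458.0


def speed_of_sound_mps(temp_c: float) -> float:
    """Common linear approximation for dry air; temp_c in degrees Celsius."""
    return 331.3 + 0.606 * temp_c


def tof_distance_m(round_trip_s: float, wave_speed_mps: float) -> float:
    """Distance to the target: the pulse travels out and back, hence the division by 2."""
    return wave_speed_mps * round_trip_s / 2.0


if __name__ == "__main__":
    # Optical ToF: a 20 ns round trip corresponds to roughly 3 m.
    print(f"optical range: {tof_distance_m(20e-9, SPEED_OF_LIGHT_MPS):.3f} m")

    # Acoustic ToF: the same echo time interpreted with an assumed 20 C air
    # temperature when the air is actually at 0 C.
    echo_s = 0.030
    assumed = tof_distance_m(echo_s, speed_of_sound_mps(20.0))
    actual = tof_distance_m(echo_s, speed_of_sound_mps(0.0))
    print(f"acoustic range error: {assumed - actual:.3f} m")
```

In this sketch a 20 C error in the assumed air temperature shifts a 5 m acoustic range estimate by about 0.18 m, which illustrates why acoustic ToF is considered unreliable in uncontrolled site conditions.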


3.1.2 Vision-based systems 


Vision-based techniques rely on different computer vision methods to locate and track targets within a video 
sequence or a series of images (Jinyu et al., 2019; Servières et al., 2021; Williams et al., 2014; Zhou et al., 2008).
A major advantage of vision-based techniques is that they rely on cameras which provide an affordable solution 
to capture lots of information, in addition to being available in many forms and types (Song & Kook, 2022; Yang 
et al., 2023). Vision-based tracking techniques use image processing methods to estimate the pose of the camera
relative to the real world and so are analogous to closed-loop systems which correct errors dynamically (Zhou et 
al., 2008). However, they suffer at higher movement speeds and are dependent on the uncontrolled conditions of
the environment they operate in, such as scene complexity, lighting, and weather (Servières et al., 2021; Yu
et al., 2016; Zhou et al., 2008). In addition, because vision-based techniques rely on recognising and tracking
visual cues within the surroundings, tracking becomes challenging in an unknown environment as the system takes more
time to collect enough data to analyse the surroundings and deduce the user’s pose (Servières et al., 2021; Siltanen,
2012). To overcome this challenge, predefined signs (markers) that are easily detectable by the visual tracking 
system can be placed in the environment. This approach is called marker-based, where the marker is used as the 
reference for the positioning system to superimpose the virtual objects onto the real world (Nee & Ong, 2023; 
Siltanen, 2012; Williams et al., 2014). However, a major drawback of marker-based techniques is that tracking 
requires the markers to be always visible in the coverage area of the camera for a stable experience (Nee & Ong, 
2023). Multi-marker tracking methods, which distribute a group of markers for the camera to detect, have been
developed to expand the coverage area. Yet, marker-based approaches are generally considered obtrusive
and can be easily obstructed in construction sites (Song & Kook, 2022; Wang et al., 2013). 


To overcome the limitations of marker-based tracking, markerless tracking can recognise and track natural features 
in the environment such as edges, corners, and textures, and use them as location references to overlay virtual 
elements and determine the pose of the user (Nee & Ong, 2023; Servières et al., 2021; Siltanen, 2012). Markerless 
tracking offers several advantages, such as flexibility and ease of use since it doesn't require physical markers. 
However, because the term “marker-based” implies the need to do some preparations in the environment before
initialising the coordinate system, the term “markerless” gives the misleading impression that it can work anywhere
without prior preparation. Markerless tracking is often perceived as a universally capable technology;
however, its practical implementation presents numerous challenges in terms of computational requirements, noise
reduction, and user-friendly interaction techniques (Servières et al., 2021). The majority of research in BIM-AR
adopts either a marker-based or a markerless approach, overlooking “extended tracking” approaches which allow
tracking of digital elements to persist in the user’s field of view even when the initial target is no longer in the
frame of the camera (Vuforia.Com, 2023; Wikitude, 2023).


Vision-based techniques can be further divided into model-based and Visual SLAM
(Simultaneous Localisation and Mapping), also referred to as V-SLAM (Figure 3). While both techniques depend 
on feature tracking and matching between a series of images, the difference mainly lies in the system’s knowledge 
of what it should track. Model-based systems use pre-existing information about the environment. In other words, 
the system has prior knowledge of what it will track which can be fiducial 2D features such as images and QR 
codes (marker-based approach), or a group of edges, corners, and textures that define a 3D object (object tracking 
approach) (Palmarini et al., 2018; Siltanen, 2012). In contrast, V-SLAM systems gradually reconstruct their 
environment while tracking the user’s pose (Nee & Ong, 2023; Yang et al., 2023). V-SLAM techniques do not 
have prior knowledge of what to track, and so they continuously create information about surroundings utilising 
an “ad-hoc” visual tracking method (Palmarini et al., 2018; Servières et al., 2021). V-SLAM relies on principles
from “Structure from Motion” (SfM) to create a 3D structure of an unknown environment and then extends them by
incorporating real-time pose estimation through a set of algorithms that optimise computational
efficiency (Nee & Ong, 2023; Yang et al., 2013). However, because the environment map is required for pose
estimation and vice versa, the main challenge of V-SLAM approaches is the accumulation of small errors in the
estimated poses, which can lead to larger errors in the map and, in turn, in subsequent pose estimates. Hybrid systems,
which fuse different kinds of sensory information, were developed to produce more accurate results.
Figure 2 Interconnections between vision-based positioning techniques
SECTION A - EXTENDED REALITY TECHNOLOGIES IN CONSTRUCTION


3.1.3 Hybrid systems 


Hybrid systems fuse information from different sensors and cameras to compensate for the limitations of each 
technique. In particular, the fusion of visual tracking systems and inertial sensors has gained significant popularity. 
Cameras excel in providing precise measurements by leveraging visual feature matching and multi-view geometry. 
However, in scenarios where image quality is compromised due to factors like rapid motion or sudden changes in 
lighting, purely visual tracking systems often encounter failures (Nee & Ong, 2023; Williams et al., 2014). In 
contrast, inertial sensors remain unaffected by image quality issues and demonstrate particular proficiency in 
tracking high-frequency, fast motion. Nevertheless, the measurements obtained from an IMU are subject to high 
noise levels and drift over time (Nee & Ong, 2023; Zhou et al., 2008). By fusing visual information from cameras 
and inertial information from IMUs, the system is better able to dynamically correct errors and provide more
accurate results in constructing a 3D map of the environment while estimating the pose of the device. This is 
known as Visual-Inertial SLAM (VI-SLAM) which is the technology used for fusing the information from the 
different sensors for environment mapping and 3D pose estimation (Jinyu et al., 2019). VI-SLAM can be 
considered a subset of multi-sensor fusion techniques. Multi-sensor fusion is a broader concept that encompasses 
the integration of data from multiple sensors, which can include cameras, inertial sensors, GPS, LiDAR and more 
(Figure 3). A challenging aspect of hybrid systems is the need to calibrate the cameras and
sensors so that their measurements are aligned within a shared coordinate system, a process commonly
referred to as hand-eye calibration (Nee & Ong, 2023).
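
The complementary behaviour of inertial and visual measurements can be illustrated with a one-axis complementary filter. This is a deliberately simplified sketch and not VI-SLAM itself, which estimates a full 6DOF pose together with a map; the function name, the gyroscope bias and the weighting factor `alpha` are assumptions made for illustration.

```python
def fuse_heading(gyro_rates_dps, visual_headings_deg, dt_s, alpha=0.98):
    """Complementary filter over one rotation axis.

    gyro_rates_dps: angular rates from the IMU (deg/s), one per time step.
    visual_headings_deg: camera-derived headings (deg), or None when the
        visual tracker has lost the scene (e.g. motion blur).
    alpha: weight given to the integrated gyro estimate (assumed value).
    """
    heading = 0.0
    history = []
    for rate, visual in zip(gyro_rates_dps, visual_headings_deg):
        predicted = heading + rate * dt_s  # fast inertial prediction
        if visual is None:
            heading = predicted            # no visual fix: rely on the IMU only
        else:
            # Blend the prediction with the drift-free visual measurement.
            heading = alpha * predicted + (1.0 - alpha) * visual
        history.append(heading)
    return history


if __name__ == "__main__":
    steps, dt = 3000, 0.01                 # 30 s at 100 Hz
    gyro = [0.5] * steps                   # stationary device, 0.5 deg/s gyro bias
    vision = [0.0] * steps                 # camera reports the true heading (0 deg)
    imu_only = fuse_heading(gyro, [None] * steps, dt)[-1]
    fused = fuse_heading(gyro, vision, dt)[-1]
    print(f"IMU only: {imu_only:.2f} deg, fused: {fused:.2f} deg")
```

With these assumed values the IMU-only estimate drifts by about 15 degrees over 30 s, whereas the fused estimate stays within roughly a quarter of a degree. Both inputs must already be expressed in a shared coordinate system, which is the role of the calibration step described above.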
Figure 3 A comprehensive map of the positioning techniques used in BIM-based AR


3.1.4 Manual Mapping 


To minimise the complexity of the dynamic real-time update of user movement relative to the surrounding 
environment, some studies used manual mapping techniques. Manual mapping involves using matrices to create 
a relation between world space and camera space and use this relation to overlay digital elements on physical 
objects (Amin et al., 2023). The world space represents the coordinate system that corresponds to the physical 
environment where objects have specific positions and orientations. On the other hand, the camera space refers to 
the coordinate system of the camera within an AR device which captures the real-world scene (Figure 4). By 
creating a similar situation using a virtual camera and a 3D model, the positioning system is then able to properly 
overlay and align digital elements from the 3D model by converting the coordinates from the world space to the 
camera space (Amin et al., 2023; Nee & Ong, 2023). Manual mapping techniques do not provide a dynamic 
experience as the user is restricted by the location of the camera. Few studies have experimented with this 
technique as in Dai and Lu (2010), Lin et al. (2020) and Gomez-Jauregui et al. (2019). 
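
The conversion from world space to camera space described above can be sketched with a rigid transform followed by a pinhole projection. This is a minimal sketch rather than the implementation used in the cited studies: the pinhole model, the function names, and the camera pose and intrinsic values are assumptions.

```python
def world_to_camera(point_w, rotation, translation):
    """Apply p_cam = R * p_world + t, with R given as a 3x3 row-major matrix."""
    return tuple(
        sum(rotation[r][c] * point_w[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def project(point_c, fx, fy, cx, cy):
    """Pinhole projection of a camera-space point (z forward) to pixel coordinates."""
    x, y, z = point_c
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    # Hypothetical set-up: camera axes aligned with the world axes and the camera
    # placed 5 m back from the world origin along z, so t = -R * camera_position.
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    t = (0.0, 0.0, 5.0)
    corner_w = (1.0, 0.5, 0.0)  # a BIM element corner in world space (m)
    corner_c = world_to_camera(corner_w, R, t)
    print(project(corner_c, fx=800.0, fy=800.0, cx=640.0, cy=360.0))
```

Because `R` and `t` are fixed by hand in this sketch, the overlay is only valid while the physical camera stays at that pose, which reflects the static nature of manual mapping noted above.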
Figure 4 Demonstration of the concept of manual mapping from Lin et al. (2020)
3.2 Dominant techniques and devices 


BIM-AR is used for four main applications in the construction stage: site assistance, construction planning, 
progress tracking, and inspection. Figure 5 shows the techniques used in the mentioned applications. Various 
methods and techniques are used across applications, with the exception of LiDAR. The development of
BIM-AR platforms that are capable of supporting different positioning techniques may therefore be required. In addition,
more research is needed on the effectiveness of these techniques from a practitioner perspective. While several
studies have been carried out in real-world construction sites, the focus has been on the technological aspects of
BIM-AR rather than on the practical implementation of the technology.
Figure 5 Positioning techniques used in different construction applications


4. DISCUSSION 


Common in BIM-AR positioning is the need to map anchor points between digital models and the physical 
environment. The positioning system will always require a physical reference in the real world that can be mapped 
to a reference in the digital model. In BIM-AR applications that utilise natural features for positioning, users are 
usually asked to select a vertical wall edge and a horizontal edge representing a horizontal direction, then select 
the corresponding edges in the digital model. The same process occurs with artificial markers; the locations of the 
physical markers are mapped to digital ones in the digital model. The dynamic nature of construction sites means
that the locations of these anchor points may change frequently as the project progresses. Studies that used
marker-based tracking indicated that, in the context of construction sites, there is a considerable possibility that markers will
get obstructed by other objects (Kwon et al., 2014; Lin et al., 2020). Studies that used natural feature tracking have
argued that natural features could be obstructed by other construction activities or disappear because site scenes keep
changing (Lin et al., 2019; Mirshokraei et al., 2019). It is critical therefore to understand how construction activities 
unfold and assess the accessibility of reference physical elements before selecting the most suitable positioning
approach. The locations of anchor points need to be coordinated and updated accordingly to ensure continuous 
alignment between the BIM model and the evolving site conditions. 
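
The mapping of anchor points between the digital model and the site can be illustrated in plan view. The following sketch assumes that the model and the site share the same scale and vertical axis, so that two corresponding anchor points determine a rotation about the vertical and a horizontal translation; the function names and coordinates are hypothetical and do not describe any particular BIM-AR platform.

```python
import math


def align_from_two_anchors(model_a, model_b, site_a, site_b):
    """Return (yaw_rad, tx, ty) mapping model plan coordinates onto site coordinates.

    Assumes both frames share scale and the vertical axis, so only a rotation
    about the vertical and a horizontal translation are unknown.
    """
    model_dir = math.atan2(model_b[1] - model_a[1], model_b[0] - model_a[0])
    site_dir = math.atan2(site_b[1] - site_a[1], site_b[0] - site_a[0])
    yaw = site_dir - model_dir
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    # Translation that carries the rotated first anchor onto its site counterpart.
    tx = site_a[0] - (cos_y * model_a[0] - sin_y * model_a[1])
    ty = site_a[1] - (sin_y * model_a[0] + cos_y * model_a[1])
    return yaw, tx, ty


def model_to_site(point, yaw, tx, ty):
    """Apply the alignment to any other model point."""
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    return (cos_y * point[0] - sin_y * point[1] + tx,
            sin_y * point[0] + cos_y * point[1] + ty)


if __name__ == "__main__":
    # Hypothetical anchors: the two ends of a wall edge in the model and as measured on site.
    yaw, tx, ty = align_from_two_anchors((0, 0), (4, 0), (10, 5), (10, 9))
    print(math.degrees(yaw), model_to_site((4, 0), yaw, tx, ty))
```

When an anchor is obstructed or removed as the site evolves, the alignment would have to be recomputed from a new pair of correspondences, which is the coordination task discussed in this section.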


The continuous coordination of anchor points between the physical site and the digital model necessitates revisiting 
the responsibilities of existing BIM roles. Existing BIM roles are usually oriented around generating and managing 
information across different stakeholders for design activities. We envision that a new responsibility will emerge 
to manage the accurate positioning of BIM-AR models onsite. This new responsibility, dedicated to BIM-AR 
coordination, will coordinate the anchor points between the site and the digital model and manage the 
communication with site personnel and their safety. Therefore, more collaboration between BIM and site 
professionals is needed to consider factors such as site constraints, logistics, and project stages. Adapting BIM- 
AR positioning techniques to the nature of construction sites is a crucial step in leveraging the technology. In 
addition, it is necessary to revisit the skills of BIM roles and workforce training programmes in the context of 
BIM-AR requirements. 


5. CONCLUSION 


We have provided insights into the positioning techniques used in BIM-based AR for construction projects. By 
exploring the different positioning techniques used in BIM-AR, their interconnections, and their differences, this 
study aimed to provide insights to researchers and practitioners for assessing the suitability of these techniques in 
a construction site context. The results of the review identified three main categories of positioning techniques: 
sensor-based, vision-based, and hybrid systems. Sensor-based systems utilise various sensors like IMUs, laser- 
based depth sensors, and GPS to track the position and orientation of the device. Vision-based techniques rely on 
computer vision methods and can be further categorised into model-based and V-SLAM approaches. Hybrid 
systems combine information from different sensors and cameras to compensate for the limitations of individual 
techniques. VI-SLAM is the core technology of multiple-sensor fusion. Additionally, manual mapping techniques 
were discussed, although they are not commonly used due to their limited dynamic capabilities. The review 
highlighted the challenges and limitations of each technique, such as accuracy, computational efficiency, and 
robustness in varying environments. It became evident that the choice of positioning technique depends on the
specific requirements of construction applications, and there is no one-size-fits-all solution. Hence, we propose
that research on BIM-AR should target flexible positioning systems that can adopt more than one technique.
In addition, the findings shed light on the need for continuous coordination of anchor points between the physical 
site and the digital model considering the evolving nature of construction sites. Consequently, a new responsibility, 
dedicated to BIM-AR coordination, will emerge to manage the positioning of BIM-AR models onsite and 
communicate with site personnel. This calls for more collaboration between BIM and site professionals, as well 
as revisiting the skills and training programs of existing BIM roles to accommodate BIM-AR requirements. 